

Poster: bgb Date: May 12, 2004 9:13am
Forum: petabox Subject: Re: Virtual Disk Software?

The 1U nodes are now connected by 100 Mbit Ethernet. There is no significant use of NFS or any other network filesystem for the bulk of Archive web data. No FC or SCSI -- this is all commodity parallel ATA/IDE disk. Simple and cheap. High-level software mirrors the archive files so that each one exists on two different nodes.
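As a rough illustration of file-level mirroring onto two different nodes, here is a minimal Python sketch. It is not the Archive's software, which this thread does not describe. The hash-based placement, the function names `pick_two_nodes` and `mirror_file`, and the use of local directories to stand in for nodes are all assumptions made for the example.

```python
import hashlib
import shutil
from pathlib import Path


def pick_two_nodes(item_name, nodes):
    """Pick a primary and a mirror node for an item (hypothetical placement rule).

    The two nodes returned are always different from each other.
    """
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to mirror")
    digest = int(hashlib.sha1(item_name.encode("utf-8")).hexdigest(), 16)
    primary = digest % len(nodes)
    # An offset in 1..len(nodes)-1 guarantees the mirror is not the primary.
    offset = 1 + (digest // len(nodes)) % (len(nodes) - 1)
    mirror = (primary + offset) % len(nodes)
    return nodes[primary], nodes[mirror]


def mirror_file(src, item_name, node_roots):
    """Copy src as an ordinary file under two different node directories.

    node_roots maps a node name to a directory standing in for that node's
    disk (an assumption; a real cluster would copy over the network).
    """
    copies = []
    for node in pick_two_nodes(item_name, sorted(node_roots)):
        dest_dir = Path(node_roots[node]) / item_name
        dest_dir.mkdir(parents=True, exist_ok=True)
        copies.append(shutil.copy2(src, dest_dir))
    return copies
```

Each copy is a plain file in a plain directory, so either disk can be read on its own without the other.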


Poster: mbuechler Date: May 12, 2004 9:42am
Forum: petabox Subject: Re: Virtual Disk Software?

My question was about how external computers connect to the unit. How would, say, a server connect to this storage unit: through a network service, through a SAN connection, or something entirely different?


Poster: Brak Date: May 12, 2004 11:39am
Forum: petabox Subject: Re: Virtual Disk Software?

The software which runs on the petabox is arbitrary. It could be Linux, it could be (gasp!), actually, I won't even say it.

I believe the main point of the petabox is the thought of providing a petabyte of available storage. What you do with 800 nodes and 3200 disks is up to you, really.
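To put those figures in perspective, here is a back-of-the-envelope calculation in Python. It assumes a decimal petabyte (10^15 bytes) spread evenly over the disks as raw capacity, with no allowance for mirroring or filesystem overhead. Those assumptions are mine, not the post's.

```python
NODES = 800
DISKS = 3200
PETABYTE = 10**15  # assumption: decimal petabyte, raw capacity

disks_per_node = DISKS // NODES       # disks in each 1U node
bytes_per_disk = PETABYTE / DISKS     # raw bytes each disk must hold
gb_per_disk = bytes_per_disk / 10**9  # same figure in decimal gigabytes

print(disks_per_node, "disks per node")
print(gb_per_disk, "GB per disk")
```

Under those assumptions this works out to 4 disks per node and 312.5 GB per disk.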

The current design and implementation aim to store X amount of data very efficiently in terms of space, power, and cost, both in deployment and in ongoing operation. Power and cooling are huge considerations when you are building something of this scale.

Storing data is the main purpose. The most compelling reason for the Internet Archive not virtualizing the storage across the cluster with RAID, GFS, or Lustre is a design decision grounded in what should be the mission of any archive: keeping as much of the data available for as long a period as possible. Having multiple petaboxes spread around the world with the same data is one way of ensuring preservation. Perhaps more importantly, if at some arbitrary point in the future a single hard drive is dug out of the side of a hill where a digital archive used to stand, there is at least some hope of getting some part of the data off of the drive in a useful form.

A drive that was part of some virtualized SAN with GFS and RAID would be more or less useless to a digital archeologist. However, a drive with actual files has some hope of providing useful data, music, books or other information.

Virtualization could certainly be done with this hardware for other applications, but perhaps different decisions would be made regarding switching fabric and cpu power.

This post was modified by Brak on 2004-05-12 18:39:42


Poster: billmoyer Date: May 12, 2004 11:25am
Forum: petabox Subject: Re: Virtual Disk Software?

The PetaBox is visible to the outside world as a collection of IP addresses, each corresponding to a homeserver or storage node. They respond to HTTP, FTP, SSH, and rsync requests over TCP/IP to make their data available. The Internet Archive has web-based software which attempts to make this access more transparent at the top level, but as of yet this software is fairly specific to the Archive's application. There is other software which can be used more generically to map named data items to the storage nodes which contain them. The homeserver node can be used as the entry point for finding any named item on the cluster, using this software via the Common Gateway Interface (CGI).
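The idea of mapping a named item to the storage nodes that hold it can be sketched roughly as follows. This is only an illustration: the catalog contents, the hostnames, the URL layout, and the function names `locate` and `item_urls` are all hypothetical and do not describe the Archive's actual locator software.

```python
from urllib.parse import quote

# Hypothetical catalog: item name -> storage nodes holding a copy of it.
CATALOG = {
    "example-item": ["node017.example.org", "node342.example.org"],
}


def locate(item_name, catalog=CATALOG):
    """Return the storage nodes that hold the named item (empty if unknown)."""
    return list(catalog.get(item_name, []))


def item_urls(item_name, catalog=CATALOG, scheme="http"):
    """Build one URL per node holding the item (the URL layout is an assumption)."""
    return [
        f"{scheme}://{host}/{quote(item_name)}/"
        for host in locate(item_name, catalog)
    ]


print(item_urls("example-item"))
```

A client would ask the entry point for an item by name and then fetch the data directly from one of the returned nodes over an ordinary protocol such as HTTP.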

If you are asking about the physical link layer, it is talking to the outside world via gigabit ethernet, over which IP packets are transmitted.

-- TTK


Poster: mbuechler Date: May 12, 2004 11:37am
Forum: petabox Subject: Re: Virtual Disk Software?

I had high hopes that someone had finally come up with a usable Linux SCSI target solution. I am very interested in SCSI target support in Linux or BSD for use in home-grown storage arrays accessible over a fibre SAN.

Thanks for your explanation.